Entorhinal cortex epigenome-wide association study highlights four novel loci showing differential methylation in Alzheimer’s disease

Background Studies on DNA methylation (DNAm) in Alzheimer’s disease (AD) have recently highlighted several genomic loci showing association with disease onset and progression. Methods Here, we conducted an epigenome-wide association study (EWAS) using DNAm profiles in entorhinal cortex (EC) from 149 AD patients and control brains and combined these with two previously published EC datasets by meta-analysis (total n = 337). Results We identified 12 cytosine-phosphate-guanine (CpG) sites showing epigenome-wide significant association with either case–control status or Braak’s tau-staging. Four of these CpGs, located in proximity to CNFN/LIPE, TENT5A, PALD1/PRF1, and DIRAS1, represent novel findings. Integrating DNAm levels with RNA sequencing-based mRNA expression data generated in the same individuals showed significant DNAm-mRNA correlations for 6 of the 12 significant CpGs. Lastly, by calculating rates of epigenetic age acceleration using two recently proposed “epigenetic clock” estimators we found a significant association with accelerated epigenetic aging in the brains of AD patients vs. controls. Conclusion In summary, our study represents the hitherto most comprehensive EWAS in AD using EC and highlights several novel differentially methylated loci with potential effects on gene expression. Supplementary Information The online version contains supplementary material available at 10.1186/s13195-023-01232-7.


Background
Alzheimer's disease (AD) is a progressive, neurodegenerative disease that accounts for 50-60% of all dementia cases [1]. The number of AD cases is increasing and it was recently estimated that nearly 44 million individuals lived with dementia in 2016 world-wide [2]. On a neuropathological level, the hallmarks of AD are accumulations of amyloid-beta (Aβ) plaques and neurofibrillary tangles (NFTs) consisting of hyperphosphorylated tau protein.
There is growing evidence that the first neuropathological changes already occur two decades or more prior to the onset of clinical symptoms [3]. In early AD, NFTs are regularly observed without the formation of Aβ, with neuropathological changes typically starting in the transentorhinal followed by the entorhinal cortex (EC) before spreading across most cortical brain regions while the disease progresses [4]. Owing to this spatio-temporal course, the EC represents an interesting and informative brain region to study in molecular AD research, including studies aimed at the epigenome or transcriptome.
In this study, we generated DNAm (using the Methyla-tionEPIC microarray) and mRNA (using RNA sequencing) expression profiles in the same EC slices from 65 AD cases and 84 control brains. These data were used to conduct a DNAm-based EWAS using both case-control status and Braak's tau-staging (henceforth termed "Braak staging") as predictors. For the EWAS part, we combined our DNAm data with data from two previously published EC studies (both generated using the 450 K Methylation microarray) [10,18] increasing our total sample size to n = 337. Significantly differentially methylated sites were then correlated with corresponding mRNA levels to probe for potential effects of DNAm on gene expression.

Human samples
Snap-frozen, post-mortem human brain tissue from EC slices (Brodmann area BA28) from 91 AD patients and 92 elderly control individuals was obtained from the Oxford Brain Bank. The AD patients and healthy controls were part of the longitudinal, prospective Oxford Project to Investigate Memory and Aging (OPTIMA) using protocols which have been described in detail elsewhere [19]. All subjects underwent a detailed clinical history, physical examination, assessment of cognitive function (Cambridge Examination of Mental Disorders of the Elderly (CAMDEX) [20] with the Cambridge Cognitive Examination (CAMCOG) and Mini-Mental State Examination (MMSE) biannually. The pathological diagnosis of AD was made using the Consortium to Establish a Registry for Alzheimer's disease (CERAD)/National Institutes of Health (NIH) criteria and Braak staging [21][22][23]. Post-mortem interval (PMI) was 54 h on average for all included samples [range ). All included patients were of white European descent by self-report. The Ethics Committees of Oxford University and University of Lübeck approved the use of the human tissues for our study and all participants gave informed consent and the study was conducted in accordance with the Declaration of Helsinki. Details regarding the DNA and RNA extraction, as well as the procedures for DNAm profiling using the "Infinium MethylationEPIC" array (EPIC; Illumina, Inc.), RNA sequencing, and quality control (QC) can be found in the supplementary methods. The EPIC array is the successor to the widely used "Infinium HumanMeth-ylation450" kit; the high precision and reproducibility of the DNAm data generated with the EPIC array has been validated in a large number of independent reports [24][25][26]. A detailed sample description can be found in Table 1.

Epigenome-wide association study (EWAS) analyses to identify differentially methylated probes (DMPs) and differentially methylated regions (DMRs)
Statistical analyses to identify differentially methylated probes (DMPs) were performed based on linear regression models using the lm function in R using case-control status (as dichotomous variable) or Braak stage (as continuous variable) as predictor in the EWAS, respectively: (1) DNAm(case − control) ∼ ADstatus + age + sex + DNAm PCs + genetic PCs (2) DNAm(Braak staging) ∼ Braak stage + age + sex + DNAm PCs + genetic PCs To account for differences in the DNAm profiles due to technical (e.g. laboratory batch, array) and other (e.g. cell-type composition of samples, genetic ancestry) factors we implemented an elaborate protocol of batch effect correction of known and unknown confounders using principal components from a PCA on the genome-wide SNP and DNAm data (see Supplement for full details). To account for multiple testing, we set the study-wide alpha to 7.51E-08 to adjust for 665,796 analyzed CpGs in this arm of our study. Differentially methylated regions (DMRs, i.e. combinations of consecutive DMPs) were assessed with the comb-p tool [27] with the maximal gap within a region set to 500 base pairs and the seed p-value set to 1.00E-03. Only regions including at least three cytosine-phosphateguanine (CpG) sites were considered. Significance of the DMR results was determined using the Sidak method (as implemented in comb-p).
Annotation of CpGs to specific gene regions was based on the Illumina manifest (v1.0 B5) for the EPIC array and the GREAT annotation tool [28]. DNAm-mRNA correlation analyses (see below) were performed for all genes annotated to a specific CpG showing association per EWAS.

Meta-analysis of epigenome-wide association study (EWAS) results
To increase power of our EWAS, we combined our EPIC array-based results with those from two publicly available AD EC datasets (GEO accession numbers GSE59685 with 58 AD cases and 21 controls ["London-1"]; and GSE105109 with 68 AD cases and 28 controls ["London-2"]). The descriptions of these datasets can be found in the primary publications [6,10]. Here, we downloaded the processed DNAm values and repeated EWAS analyses for Braak stage and AD case-control status using the same linear regression models as described above. These regression models included 13 DNAm PCs for the Braak stage analysis of GSE59685, and 15 DNAm PCs for the remaining analyses. The meta-analysis was conducted with a fixed-effect inverse-variance approach using the function metagen in the R package "meta" [29]. To account for multiple testing, we set the study-wide alpha to 1.64E-07 to adjust for 304,996 meta-analyzed CpGs in this arm of our study.

Alzheimer's disease poly-epigenetic scores
To assess the correspondence of our novel DNAm data to previous EWAS on the topic, poly-epigenetic scores (PES) for each individual were calculated based on the test statistics from two publicly available AD EC datasets (GEO accession numbers GSE59685; GSE105109). To this end, we combined uncorrelated CpGs into one aggregated DNAm variable and tested these as predictors in regression models analogous to the primary EWAS. For more details see Supplementary Material.

Epigenetic age estimation
Two epigenetic age predictors were used in our analyses: 1. the "Horvath multi-tissue predictor" (HMTP) [30] and 2. the "cortex clock" (CorCl) [31]. Since most other popular epigenetic clocks (Hannum [32], PhenoAge [33], GrimAge [34]) were calibrated for blood tissues, we did not include analyses of these age estimators in this study. For more details regarding the calculation of the epigenetic age estimates see Supplementary Material.

DNAm-mRNA correlation analyses
The normalized RNA-seq data for the selected CpG candidate genes (see above) were correlated to their corresponding DNAm signal, using the Spearman method in R's cor.test function. The resulting p-values were corrected for multiple testing using the Benjamini-Hochberg procedure (as implemented in R's p.adjust function). Note that for some loci more than one gene was annotated to specific CpGs; in these instances, multiple DNAm-mRNA correlation results were computed for the same CpG. For more details, see Supplementary Material.

EWAS of case-control status and Braak staging highlights five DMRs
In the EWAS analyses of all 665,796 CpG-probes that passed QC on the EPIC array, none of the CpGs reached the experiment-wide Bonferroni-corrected significance threshold α = 7.51E-08, neither in the analyses of AD case-control status ( Supplementary Fig. 2) nor Braak stage ( Supplementary Fig. 3). However, we note that several CpGs reached at least suggestive evidence of association with either phenotype (α = 1.00E-05; Supplementary Tables 1 and 2). Among these, cg25191519 showed a particularly strong association signal (p = 8.90E-06). This CpG is annotated to the genes SPG7 and RPL13, which were already reported in previous AD DNAm studies [8,[16][17][18]. Next, we used the DMP test statistics to assess the presence of DMRs, i.e., consecutive runs of differentially methylated probes which are aggregated into "regions". After adjustment for multiple testing, five DMRs (near genes PRKCZ, CYFIP1, ACOT7, COL4A1, IBA57, and C1orf69) showed significant association with AD case-control status ( Table 2). In contrast, no DMRs were found when combining DMPs from the Braak stage EWAS using p-comb. Two of the six genes highlighted by the significant case-control DMRs were previously described in the context of AD DNAm profiling studies. First, a DMR near CYFIP1 was reported by Li et al. to show association with Braak stage [16]. Second, the gene IBA57 is annotated to CpG-probe cg12461930, which also showed association with Braak stage in the crosscortex meta-analysis by Smith et al. [18].
To further evaluate the degree of correspondence of our novel EWAS data with those from the recent EWAS meta-analysis by Smith et al. [18], we repeated the Braak stage meta-analysis from two publicly available EC DNAm datasets used in previous AD EWAS ("London-1" and "London-2", with GEO accession numbers GSE59685 and GSE105109, respectively) [6,10], which were also included in the Smith et al. study [18], and used the resulting test statistics with varying p-value thresholds to calculate the PES. We then tested the PES for association with Braak stage in our dataset using linear models equivalent to the primary EWAS. These analyses revealed that the PES was, indeed, significantly associated with Braak stage in our dataset (7.83E-06 ≤ p ≤ 1.88E-01, Supplementary Table 3) and explained up to 12% of the phenotypic variance in our dataset. Overall, these results suggest that our dataset is equivalent in terms of data quality when compared to those previously published in the field.

EWAS meta-analysis highlights 12 DMPs showing experiment-wide significant association with AD
To increase power, we combined the EWAS results from our samples with those from the two publicly available EC DNAm datasets London-1 and London-2. Of note, the DNAm data of these prior studies were generated on the predecessor 450K DNAm array (Illumina, Inc.) which has a substantially lower resolution leading to a smaller number of meta-analysed CpGs. The metaanalysis across all three datasets comprised 320 samples for the AD case-control analysis, and 337 samples for the Braak stage analysis. Overall, there were 304,996 overlapping CpGs available for this meta-analysis, resulting in an experiment-wide significance threshold of α = 1.64E-07 for this arm of our study. Using this threshold, five CpGs in the AD case-control meta-analysis ( Fig. 1, Table 3), and nine CpGs in the Braak stage meta-analysis ( Fig. 2, Table 4), reached experiment-wide significance. Importantly, four of these were not previously reported in the context of EWAS using AD Braak stage or case-control status as phenotypes and can be considered bona fide novel findings of our study. Two CpGs (cg03169557 [near the genes RPL13 and SPG7] and cg05066959 [near the genes NKX6-3, ANK1, and MIR486]) were significantly associated with both AD case-control status and AD Braak stage and were already described in previous DNAm AD studies [8,[16][17][18]. While none of the analyses in the individual datasets analyses showed notable inflation in the test statistics (Supplementary Table 4), both meta-analyses displayed slightly increased inflation (λ case-control = 1.16; λ Braak = 1.24; Supplementary Fig. 4), a relatively common observation in EWAS as already noted in Smith et al. [18].
The four newly associated CpG-probes cg03073402, cg22388948, cg20648333, and cg05228284 are located in or near the genes CNFN/LIPE, TENT5A, PALD1/PRF1, and DIRAS1, respectively. The implicated CpGs all displayed a reduction of DNAm associated with AD, i.e., a negative effect size estimate. Of these genes, DIRAS1 shows the most pronounced expression in brain (GTEx V8) followed by LIPE, TENT5A, and PALD1. In contrast, CNFN and Table 2 Results of EWAS using AD case-control status in Oxford sample AD case-control DMRs identified using p-comb software. CpGs number of CpGs in each DMR, p adj p-value after adjustment for multiple testing based on the Sidak method a Evidence for implication of same or largely overlapping locus from studies using DNAm assessments in AD-related phenotypes. More (p < 1.00E-05) results from this analysis can be found in Supplementary Table 1  PRF1 do not show any noteworthy expression in the brain tissues analysed in GTEx. Using a PubMed search (using "{gene name} AND alzheimer*} as search terms) revealed that no publication exists to date directly linking any of these genes to AD. Look-up of the CpG IDs on the EWAS Catalog (http:// www. ewasc atalog. org/) [35] revealed that cg03073402 (CNFN/LIPE) and cg20648333 (PALD1/PRF1) were previously reported to be associated with aging from birth to late adolescence in blood samples [36].
The GWAS catalog (https:// www. ebi. ac. uk/ gwas/ [37]) also revealed no noteworthy AD-related entries for CNFN, LIPE, PALD1, PRF1, or DIRAS1. In contrast, genetic variants in TENT5A (a.k.a. FAM46A) have been found associated by GWAS with a number of traits (https:// www. ebi. ac. uk/ gwas/ genes/ TENT5A), some of them with direct relevance for AD, e.g. "Alzheimer's disease, posterior cortical atrophy", "Alzheimer's disease, cognitive decline measurement", "PHF-tau measurement", "neurofibrillary tangles  Table 3 Results of EWAS meta-analysis using AD case-control status Experiment-wide significant CpGs (p < 1.64E-07) in the meta-EWAS ascross three datasets (London-1, London-2, Oxford) using AD case-control status a Evidence for implication of same or largely overlapping locus from studies using DNAm assessments in AD-related phenotypes. Annotation of CpGs to specific gene regions was based on the Illumina manifest (v1.0 B5) for the EPIC array and the GREAT annotation tool [28]  measurement", and "temporal pole volume measurement", underscoring the potential mechanistic involvement of this gene in AD pathogenesis. The EWAS signal for TENT5A was elicited by CpG-probe cg22388948 (located intronically) with a p-value of 5.83E-08 in the AD case-control meta-analysis. Suggestive evidence for association with AD case-control status with the same effect direction could also be observed in the individual London datasets (p L1 = 6.73E-05, effect L1 = -0.0375, p L2 = 1.72E-02, effect L2 = -0.0174), emphasizing the robustness of the   Tables 3 and 4, Supplementary Table 5). Interestingly, three of the eight CpG probes that were previously reported to show experiment-wide significant association with Braak stage in EC [18] and were also present in our analysis did not show any evidence of association with Braak stage (p < 0.05) in the Oxford dataset (Supplementary Table 6). This relates to CpG probes cg04523589 (annotated to the gene CAMP), cg06653632 (annotated to SLC15A4 and TMEM132C), and cg11563844 (annotated to STARD13 and KL). In contrast, the other five CpGs (annotated to genes SPG7, ANK1, MIR486, MYO1C, ABR, ALDH16A1, and FLT3LG) showed independent evidence of association in our dataset (p-values ranging from 0.05 to 7.38E-04; Supplementary Table 6) with consistent effect directions and can therefore be regarded as an independent replication of the results of Smith et al. [18]. We also looked up the results from a recent EWAS by Piras et al. [9], which was comparable to our study in many aspects albeit using middle temporal gyrus as source of brain tissue. In addition to the ANK1, MIR486, and MYO1C results discussed above, our EWAS showed at least nominal support for 12 additional loci proposed by Piras et al. (Supplementary  Table 7).

Half of the DNAm EWAS signals correlate with gene expression
To elucidate the potential functional implications of the DNAm associations highlighted above, we performed correlation analyses between the DNAm levels and corresponding mRNA expression data generated in the same individuals from the same tissue slices. To this end, we chose all 12 significantly associated CpGs from the EWAS meta-analysis results (i.e. both DMPs [ Table 3, Table 4] and DMRs [ Table 2]), as well as the eight available AD-associated CpGs in EC from Smith et al. [18] and correlated the DNAm levels with gene expression levels of the annotated gene(s) according to the Illumina manifest (v1.0 B5) for the EPIC array and the GREAT annotation tool. Within DMR "windows", we selected the CpG showing the strongest association with AD case-control status for correlation with mRNA levels. Overall, this led to Spearman rank correlations of 39 DNAm-mRNA pairs (Supplementary Table 8). Ten of these pairs (with eight unique CpGs, two of which were CpGs from the Smith et al. EC meta-analysis [18]) showed evidence for a statistically significant DNAm-mRNA correlation after multiple testing correction (  Tables 9 and 10). In addition, using the genes with significant correlations as outcome in differential gene expression analyses performed on the same RNAseq data revealed that all ten were also significantly differentially expressed with respect to both AD case-control status and Braak stage (Table 5). One additional CpG-mRNA pair, which only showed a borderline negative correlation here (cg05972352 vs. ENSG00000126217; rho = -0.15, P-value = 0.061) was reported to show a significant correlation in the same direction in the study by de Witte et al., 2021 [38].
Generally, the correlation coefficients only indicated moderate (maximal rho = -0.35), but statistically significant correlations between DNAm and mRNA expression in this dataset. The comparatively moderate extent of the correlations likely reflects the fact that gene expression is regulated by a number of other (epi-)genetic mechanisms Table 5 Experiment-wide significant Spearman rank correlations between CpG DNAm and gene expression levels Experiment-wide significant Spearman rank correlations between CpG DNAm and gene expression levels. Β Effect sizes of gene expression association with AD casecontrol status / Braak stage, P P-value of gene expression association with AD case-control status / Braak stage after multiple testing adjustment with the Benjamini Hochberg method; Location: Location of the CpG in the genome with respect to the correlated gene; Results of all 39 tested DNAm-mRNA pairs can be found in Supplementary Tables 8, 9, and 10. One additional CpG-mRNA pair was found to significantly correlate in human brain samples by de Witte et al., 2021 [38]; in our data, this pair showed a borderline significant correlation in the same direction as described by ref. [38] (see Supplementary Table 8). CpGs are ordered alphabetically. For some loci more than one gene was annotated to specific CpGs; in these instances, multiple DNAm-mRNA correlation results were computed for the same CpG beyond DNAm [39]. Another noteworthy observation is that the signs of correlation coefficients of significant DNAm-mRNA pairs were both positive and negative, suggesting a complex relationship between DNAm status at these positions and their effect on mRNA expression. This is likely due to the fact that the majority of CpGs (nine out of ten) among the significantly correlated DNAm-mRNA pairs was located in gene bodies or distal to the stop-codon, while only one (CpG cg05228284) was located in an CpG-island (CGI) in the 5' untranslated region (UTR). Classically, DNAm is considered a mark of transcriptional repression (here expected to elicit a correlation with a negative sign), however, this only applies to CpGs located in promoter CGIs, not necessarily those located elsewhere [40]. Therefore, the presence of both positive and negative correlations between DNAm and mRNA levels ( Table 5) is not unexpected.

Epigenetic age acceleration is associated with AD in EC
In agreement with prior evidence, both DNAm age estimators were highly correlated with chronological age (HMTP: Pearson's r = 0.56, p = 1.078E-13; CorCl: r = 0.81, p < 2.20E-16) in our dataset. However, CorCl showed a (much) stronger correlation with chronological age compared to HMTP, and also did not show the tendency to under-estimate epigenetic ages compared to chronological age (Fig. 3).
Using these estimates, we determined the degree of "age acceleration" which was defined as the residual from a linear regression of DNAm age on chronological age [41]. This estimate was probed for an association with either AD case-control status or Braak stage using linear regression. Our expectation was that samples with an advanced disease state (e.g., AD vs. control, or high Braak stage vs. low Braak stage) would also show a more pronounced age acceleration (i.e., older epigenetic age when compared to chronological age). In concordance with this expectation, we found that age acceleration estimates were, indeed, associated with disease state (2.72E-05 ≤ p ≤ 1.96E-04, Table 6), with higher age acceleration being associated with AD cases or higher Braak stages (Table 6). In additional analyses, we ran regression models similar to those for the primary EWAS, i.e. accounting for genetic ancestry as well as unknown confounders with respect to the DNAm data. After including   Table 11). Of note, DNAm PC1 and PC2 showed consistent associations with disease state, and therefore likely capture variance in DNAm data that reflect unknown confounders in the dataset (Supplementary Table 11), with the true level of association likely located somewhere between both models.

Discussion
In this study we performed various EWAS analyses on a large collection of DNAm profiles generated in human EC tissue samples. Meta-analysis of newly generated DNAm data with those from two previous EC-based EWAS provides evidence for four novel loci showing significant association with either AD casecontrol status or Braak stage after stringent multiple testing correction. Using RNAseq data generated from the same individuals/tissue samples, we identified significant correlations between DNAm levels and mRNA expression for 10 out of the 39 DNAm-mRNA pairs within these EWAS loci. One additional locus (MCF2L [ENSG00000126217]) which only showed borderline significance here was recently reported to correlate with DNAm at cg05972352 in the study by de Witte et al. [38] and can arguably be counted as one additional relevant DNAm-mRNA pair in the context of our analyses. The most notable of our novel associations was observed with a CpG-probe (CpG cg22388948) in TENT5A (a.k.a. FAM46A), which not only showed consistent effect directions across all three analysed datasets, but also exhibited a significant (negative) correlation with mRNA levels of the same gene. TENT5A represents a promising novel AD candidate gene due to its previous association with several AD-relevant phenotypes by GWAS. Functionally, it belongs to the nucleotidyltransferase (NTase) fold superfamily (FAM46), which serve as noncanonical poly(A) polymerases involved in the modification of cytosolic and/or nuclear RNA 3' ends and, hence, in the regulation of gene expression [42]. Regarding the other novel signals of our EWAS, we note that although no direct evidence exists linking either LIPE (encoding lipase E [a.k.a. hormone sensitive lipase {HSL}], near CpG cg03073402 on chromosome 19p) or PRF1 (encoding perforin 1; near CpG cg20648333 on chromosome 10q) to AD, both genes are involved in molecular pathways, i.e. lipid metabolism [43] and the immune system response [44], respectively, which are both highly relevant in AD pathogenesis based on recent GWAS data [45,46]. Furthermore, the EWAS catalog lists both CpGs are associated with "human aging" [35], which further emphasizes the potential relevance of these loci. The possible link to AD pathogenesis of the last novel CpGsite cg05228284, near DIRAS1, is less obvious. This gene is highly expressed in the cerebellum, cortex, and frontal cortex according to GTEx. It belongs to the Ras superfamily of monomeric GTPases and has been previously reported as a tumor suppressor and is annotated to gene ontology pathways such as GTPase activity, protein binding, and signal transduction [47].
Other relevant outcomes of our study are the independent confirmation of some, albeit not all, previous EC-based EWAS signals [18], and the observation that epigenetic age in EC is accelerated with increasing AD progression using two recently proposed estimators of DNAm age, a finding that is consistent with previously published data [48][49][50]. We also confirm previous results that the HMTP (a.k.a. "Horvath clock"), which was trained on several tissues, may not be ideal to estimate DNAm age in the human brain cortex, and that, instead, the recently proposed "cortex clock" may be better suited for DNAm analyses in this tissue [31,51]. One potential concern with these latter analyses is that the HMTP was originally derived using DNAm data from the 450 K array (as opposed to the EPIC array used here). However, we note that the overlap between the CpG-probes present on the 450 K and EPIC array is large, and it has previously been shown that the HMTP method provides reliable results when comparing data from both arrays [52].
The strengths of our study are its comparatively large sample size (n = 149 novel EC samples; n = 337 in the EWAS meta-analyses), the analysis of a brain region highly relevant for AD research (i.e. EC), the use of the hitherto highest resolution DNAm profiling microarray (i.e. the Methylation EPIC array featuring 850 K CpGprobes), and the parallel availability of RNAseq-based mRNA expression data (allowing for detailed DNAm-mRNA correlation analyses). Despite these strengths, our study is also subject to a number of limitations which include the following. First and foremost, we used "bulk tissue" for DNAm profiling and RNAseq. Bulk tissue samples represent an agglomerate of different cell-types whose proportions (and DNAm and mRNA profiles) may vary across different samples (e.g. they may change as the disease progresses), a situation that may have affected the outcomes of our study. Single nucleus-based DNAm profiling and RNAseq would be required to fully address this question, which was not feasible in the context of our study. The next-best approach is to estimate samplespecific cell-type proportions using cellular deconvolution (ideally based on reference data from the same tissue with known cell-type composition) and include these estimates into the analyses. To this end, we estimated cell-type composition based on DNAm reference data obtained from human prefrontal cortex samples (to the best of our knowledge, no DNAm reference data exist for EC) using the estimateCellCounts function in the R package Minfi [53] and regressed these out with the remove-BatchEffect function in the R package limma [54]. Using this approach did not appreciably change our results and none of our top signals changed in the AD case-control or Braak stage EWAS (Supplementary Tables 1 and 2). Second, our reanalysis of publicly available DNAm profiles from EC resulted in slightly increased (i.e. less significant) P-values when compared to those reported in the primary publications [18]. This is likely due to a more conservative covariate adjustment scheme used in our data processing workflow. Repeating all meta-analyses based on a covariate adjustment paradigm similar to that used in Smith et al. [18] (i.e. include DNAm covariates until λ drops to < 1.2 in each individual dataset) led to an overall increase in statistical significance of most of the meta-analysis results (Supplementary Table 12), approaching those previously reported. We further note that Smith et al. [18] did not adjust their results for genetic ancestry, which may have also affected their test statistics. Therefore, the "true" level of statistical support likely lies somewhere in between both approaches, and suggests that the EWAS results reported here can be considered conservative. Third, our meta-analysis is based on fewer CpGs (n = 304,996) as reported by the overlap of the London-1 and London-2 datasets (up to n = 403,763 [6,10]). This is due to the fact that we used a different DNAm microarray ("MethylationEPIC") here, which does not show perfect overlap in CpG-probes with the predecessor array ("450 K") and removed a comparatively large number of probes from all datasets during QC. Fourth, as previously noted [18], EWAS meta-analyses tend to show inflations of the test statistics, which we also observed here. One possible reason is heterogeneity of the effect estimates across studies due to technical reasons. To address this issue, we repeated the meta-analysis using random-effect models, which indeed showed a much less pronounced degree of inflation than fixed-effect models (Case-control: λ fixed-effect = 1.16, λ random-effect = 0.92; Braak stage: λ fixed-effect = 1.24, λ randomeffect = 1.00; Supplementary Fig. 5). However, we note that for only two out of the twelve experiment-wide significant DMPs, heterogeneity measured by the i 2 statistic exceeded 50% (Supplementary Table 13), supporting the appropriateness of using the fixed-effect models, as was done in other recent EWAS meta-analyses [18]. Fifth, we note that our samples displayed relatively low bisulphite conversion efficiency rates necessitating to lower the exclusion threshold from 80%, which is typically used, to 65% here. To ensure that the EWAS results obtained were not driven by artifacts due to low sample quality, we repeated our meta-analyses using 80% as threshold, which did not substantially change the results (Supplementary Table 14 Finally, we are not able to discern cause-effect relationships from our data. The observed epigenetic associations may reflect molecular changes preceding (and perhaps modifying) the underlying disease process, but they may also represent a consequence of the same. However, the inability to disentangle the sequence of events is not a limitation of our EWAS, but of any epigenetic study. Notwithstanding, our novel results imply several new loci and potential molecular mechanisms that are associated with AD and, as a result, provide important new insights to our understanding of the pathogenetic mechanisms underlying the disease process. Lastly, despite going to great lengths to minimize spurious associations by applying conservative QC thresholds in all steps of our data processing steps, we cannot exclude the possibility that some undetected confounding factors have impacted our results. However, the overall high concordance between our and previous EWAS results argues against a strong systematic bias specific to our data and/or results. Nevertheless, all novel results to emerge from this work should be considered preliminary until independent validation is reported.

Conclusions
In conclusion, in the largest AD EWAS performed on human EC samples to date, we identified a total of 12 epigenome-wide significant CpGs, four of which are novel. Six of these CpGs show significant correlations with corresponding mRNA levels in the same samples, highlighting their potential downstream effects on gene expression. Future work is needed to validate our findings and to clarify the role of the newly implied loci in AD pathogenesis.
Additional file 1: Supplementary Methods. S1. QQ-plots for the results of the AD case-control EWAS (A, λ=1.0005), and the Braak stage EWAS (B, λ=1.0175) in the Oxford datasets. S2. Manhattan plot for the results of the AD case-control EWAS in the Oxford dataset; the red line indicates the experiment-wide significance threshold of 7.51E-08, whereas the purple line indicates the suggestive significance threshold of 1.00E-05. CpGs with suggestively significant association are marked in purple and annotated with the gene name according to the Illumina manifest (v1.0 B5). NB that three CpGs (on chr. 8 and chr. 14 were not annotated to any genes). S3.
Manhattan plot for the results of the AD Braak stage EWAS in the Oxford dataset; the red line indicates the experiment-wide significance threshold of 7.51E-8, whereas the purple line indicates the suggestive significance threshold of 1.00E-05. CpGs with suggestively significant association are marked in purple and annotated with the gene name according to the Illumina manifest (v1.0 B5). NB that one CpG (on chr. 14 was not annotated to any genes). S4. QQ-plots for the results of the AD case-control meta-EWAS (A, λ=1.16), and the Braak stage meta-EWAS (B, λ =1.24) with fixed-effect models across three datasets (London-1, London-2, Oxford). S5. QQ-plots for the results of the AD case-control meta-EWAS (A, λ=0.92), and the Braak stage meta-EWAS (B, λ =1.00) with random-effect models across three datasets (London-1, London-2, Oxford). S6. Scree plots for the results of a principal component analysis (PCA) on DNAm using binned CpGs in the Oxford (A, n=149 samples), London-1 (B, n=104 samples) and London-2 (C, n=92 samples) datasets. For the main EWAS analyses reported in this paper we selected n=13, n=13, n=15 PCs for the Oxford, London-1, and London-2, datasets, respectively, to account for known and unknown confounders of DNAm in these samples.